
PLOS Biology

Public Library of Science (PLoS)

Preprints posted in the last 7 days, ranked by how well they match PLOS Biology's content profile, based on 408 papers previously published here. The average preprint has a 0.58% match score for this journal, so anything above that is already an above-average fit.

1
Disentangling Confounders from Pathology in Long-COVID Trajectory Prediction for Women: An Interpretable Large-Language-Model Approach

Wang, J.; Galis, Z.; Zhang, T.; Luo, Y.; Sra, A.; Niu, X.; Shen, J.; Xie, Q.; Weiss, J. C.

2026-06-12 infectious diseases 10.64898/2026.06.10.26355420 medRxiv
Top 2%
4.8%

Objective. Post-acute sequelae of SARS-CoV-2 infection (PASC, "Long COVID") disproportionately affects women, in whom hallmark symptoms (insomnia, fatigue, palpitations, cognitive difficulty) overlap with comorbidities and hormonal transitions such as menopause. This diagnostic overlap is a confounding problem: models that forecast future symptom severity risk attributing baseline physiological noise to viral pathology. We ask whether an interpretable, causally disentangled language model can separate true pathological signal from such confounders while remaining competitive with strong predictors of future PASC severity

2
Metatranscriptomics-Derived Disease Risk Scores as a Preventive, Diagnostic, and Treatment Support Tool

Hu, L.; Bass, M.; Patridge, E.; Molusky, M.; Antoine, G.; Vuyisich, M.; Banavar, G.

2026-06-06 genetic and genomic medicine 10.64898/2026.05.29.26354333 medRxiv
Top 11%
1.5%

Background: Chronic diseases and symptom syndromes often develop after prolonged biological changes that may precede formal diagnosis. RNA-based metatranscriptomics captures active microbial and human gene expression and may provide a functional layer for disease risk evaluation. To address this translational gap, we developed and validated a Disease Risk Score (DRS) framework that integrates metatranscriptome-derived pathway activity scores from stool, saliva, and blood samples, and evaluated its potential clinical utility as an adjunct risk-evaluation tool. Methods: DRS uses disease-specific sets of pathway activity scores derived from stool and saliva microbial functions, stool and saliva microbial taxa, and blood human gene expression. For each disease, 'not optimal' pathway scores are aggregated into a normalized cumulative odds ratio, or cOR, using score-level odds ratios, statistical significance, and literature-supported biological relevance derived from a Development Cohort of 22,369 individuals. A cOR ≥ 5 is defined as high risk. Performance is evaluated in an independent Validation Cohort of 15,908 individuals using self-reported diseases as the reference. Disease support requires both significant cOR separation between self-reported and not-reported (Cohen's d ≥ 0.2) and risk ratio enrichment of self-reported disease among individuals classified as high risk (95% CI of Risk Ratio > 1). Results: Of 20 initially evaluated diseases, 15 meet the prespecified validation criteria on the independent validation cohort: ADHD, anxiety, chronic fatigue syndrome, depression, GERD, hypertension, inflammatory bowel disease, IBS-C, IBS-D, insomnia, MASLD, obesity, obstructive sleep apnea, Sjogren's syndrome, and type 2 diabetes.
Five selected clinical scenarios illustrate how DRS can support clinician-mediated decision making, including IBS subtype reclassification, improved diagnostic acceptance in IBS-D, personalized lifestyle counseling in MASLD and early type 2 diabetes, and diagnostic uncertainty in atypical GERD. Conclusions: DRS is a metatranscriptomics-based risk-stratification framework that aggregates active microbial and human pathway signals into interpretable disease-specific risk estimates across a wide range of disease conditions. Validation against self-reported disease labels in an independent cohort shows significant risk enrichment for each of 15 diseases. DRS is intended as an adjunct to clinical evaluation: a decision support tool in situations where routine care encounters uncertainty, delay, or low patient engagement. Future prospective studies using clinically adjudicated endpoints are needed to assess calibration and clinical outcomes.

3
Daily symptom monitoring is sustainable over months: retention, not compliance, is the primary barrier to long-duration digital tracking

Gunsilius, C. Z.; Pei, P.; Carayannopoulos, A.; Petzschner, F. H.

2026-06-10 rehabilitation medicine and physical therapy 10.64898/2026.06.08.26355180 medRxiv
Top 15%
1.1%

Ecological momentary assessment (EMA) enables real-time, longitudinal measurement of symptoms and behavior via smartphones, yet nearly all feasibility evidence comes from protocols lasting one to two weeks, far shorter than the timescales over which chronic diseases fluctuate and clinical decisions unfold. Whether daily compliance can be sustained over months, or whether it decays as short-protocol trends predict, is unknown. Here, 214 participants (173 with pain, 41 healthy controls) completed a 4-month (122-day) EMA protocol via the Soma smartphone app, generating 26,907 check-ins. Half the sample completed the full protocol without a two-week lapse. Aggregate compliance appeared moderate (50%), but this conflated two distinct phenomena: when recomputed over each participant's active period, compliance rose to 71%, with 91% achieving moderate-to-high adherence, and remained stable across all 17 study weeks. Pain status predicted earlier disengagement but not lower compliance among those who remained; after adjustment for differential retention, group differences disappeared. To our knowledge, this is the longest continuous daily EMA evaluation in a clinical population. It suggests the primary barrier to long-duration EMA is not declining motivation among active participants but concentrated early disengagement, with direct implications for the design of digital health protocols, decentralized trials, and remote symptom monitoring.
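The difference between aggregate and active-period compliance described above can be illustrated with a minimal sketch. The data layout, the one-prompt-per-day simplification, and the definition of the active period as enrollment through the last completed check-in are assumptions made for illustration; the abstract does not state how the authors computed either quantity.

```python
def compliance(checkins_by_participant, protocol_days):
    """Return (aggregate, active_period) compliance as fractions.

    checkins_by_participant: list of sets of 1-based study days with a
    completed check-in (hypothetical layout, one prompt per day).
    """
    total_done = sum(len(days) for days in checkins_by_participant)
    # Aggregate: every participant is expected to respond on every protocol day.
    aggregate = total_done / (len(checkins_by_participant) * protocol_days)
    # Active period: expected days stop at each participant's last check-in.
    active_expected = sum(max(days) for days in checkins_by_participant if days)
    active = total_done / active_expected
    return aggregate, active


# Toy example: one participant stays for all 10 days, one drops out after day 4.
stayer = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
dropout = {1, 2, 3, 4}
aggregate, active = compliance([stayer, dropout], protocol_days=10)
print(f"aggregate={aggregate:.2f}, active-period={active:.2f}")
```

In this toy case both participants respond on every day they are still active, yet aggregate compliance is pulled down by the days after the dropout left, which is the conflation the abstract describes.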

4
Computer Vision for Real-Time Anatomical Navigation in Neurosurgery: First-in-Human Clinical Evaluation and Iterative Development (IDEAL Stage 1)

Khan, D. Z.; Mao, Z.; Wijekoon, A.; Das, A.; Williams, S. C.; Blandford, A.; Jain, A.; Harris, L.; Borg, A.; Dorward, N. L.; Clarkson, M.; Bano, S.; McCulloch, P.; Stoyanov, D.; Marcus, H.

2026-06-11 surgery 10.64898/2026.06.11.26355205 medRxiv
Top 22%
0.7%

Introduction: Precise anatomical navigation is fundamental to safe endoscopic pituitary surgery, a high-stakes procedure characterised by a challenging learning curve. While traditional navigation systems often rely on workflow-disrupting probes or static preoperative imaging, advancements in computer vision AI (CVAI) now enable dynamic, real-time anatomical segmentation directly from live surgical video [1-3]. Our group has previously conducted a series of preclinical human-computer interaction studies to refine the system's design, alongside digital and high-fidelity physical simulations demonstrating the benefit of AI assistance in improving overall performance, training, and safety [4-8]. Building on this foundation, the current study represents a first-in-human application of real-time CVAI assistance in the neurosurgical operating room, serving to assess feasibility and safety, and to iteratively improve the system. Method: Guided by DECIDE-AI and IDEAL frameworks, this single-centre evaluation comprises an initial proof-of-concept phase (n=6) for endoscopic transsphenoidal pituitary surgeries. The AI model utilised a DINOv3-derived vision transformer architecture, deployed via a high-performance edge computing unit to achieve low-latency, real-time inference without reliance on cloud infrastructure [2]. Given the high-risk nature of the procedure and the early stage of clinical AI integration, the system was initially deployed as an educational adjunct on a secondary monitor, ensuring the primary surgical feed remains uncompromised. Functionality and safety were assessed via structured questionnaire, prospective observation, and blinded retrospective review of the recordings of the endoscopic surgical video feed and wider operating room environment. Continuous multi-stakeholder feedback through validated human factors surveys drove iterative technical refinements between cases. Results: Six patients with pituitary adenomas were enrolled.
The CVAI system was successfully deployed in four cases, demonstrating acceptable real-time sella segmentation accuracy. Deployment failed pre-operatively in two cases owing to a single recurring system reboot bug. Iterative refinements between cases were driven by our experience and surgical team feedback. This resulted in the integration of additional anatomical structure segmentations (e.g., carotid arteries), enhanced model accuracy via training dataset expansion, and hardware firmware upgrades. Multi-stakeholder surveys demonstrated satisfactory system feasibility, usability, and acceptability among the surgical team. Both prospective observation and retrospective video review confirmed the absence of adverse events, including no significant distraction to the primary surgeon, and there were no AI-related clinical complications. Conclusion: This first-in-human early clinical evaluation demonstrates the feasibility, safety and iterative development of real-time, CVAI-based anatomical navigation during high-stakes neurosurgery. Future work will include a larger single-centre case series (IDEAL Stage 2a) with more surgical teams to further iterate the system and explore its impact on training and workflow. As the underpinning technology improves, deployment will transition to direct intra-operative decision support and integration with other intra-operative navigational technologies.

5
Genetic Susceptibility to Incisional Hernia: Evaluation of Hernia Polygenic Risk Scores

Pregnall, A. M.; Hornick, M. M.; Broach, R. B.; Judy, R.; DePaolo, J.; Yuan, S.; Levin, M.; Fischer, J. P.; Damrauer, S. M.; Wachtel, H.

2026-06-11 genetic and genomic medicine 10.64898/2026.06.10.26355374 medRxiv
Top 22%
0.7%

Objectives: Incisional hernia (IH) affects 13-30% of people after abdominal surgery, resulting in substantial morbidity and costs. While clinical risk factors have been studied extensively, genomic risk for IH is incompletely understood. We aimed to evaluate the impact of polygenic risk scores (PRS) on IH risk prediction. Methods: We created and evaluated three PRS for abdominal hernia, ventral hernia and latent hernia susceptibility for prediction of IH in an institutional biobank. The primary outcome was defined as the diagnosis or repair of an IH based on ICD-9/10-CM/PCS and CPT codes. Clinical covariates included age, sex, body mass index (BMI), smoking status, index procedure type, and perioperative surgical site infection. A phenome-wide association study (PheWAS) was performed to assess clinical associations with increased PRS. We then tested the ability of the PRS to improve prediction for IH by modeling clinical covariates with and without PRS in patients who underwent abdominal surgery. Model performance was assessed using 10 iterations of 5-fold cross-validation to estimate Brier scores and area under the receiver operating characteristic curve (AUROC), which were compared using cross-model Bayesian analysis of variance. Results: In 55,809 subjects, assessed PRS was significantly associated with incisional, umbilical, and ventral hernia on PheWAS, with 1.19-fold greater odds of developing IH per 1-SD increase in PRS (95% CI: 1.13-1.25, P < 0.001). Of 9,909 subjects who underwent qualifying abdominal surgery, 706 developed IH. In this cohort, the latent hernia susceptibility PRS was associated with a 16% increased hazard of developing IH per 1-SD increase (HR 1.16; 95% CI: 1.07-1.26; P < 0.001).
Compared to a predictive model using clinical covariates (Brier score = 0.047, 95% CI: 0.046-0.048; AUROC = 0.660, 95% CI: 0.653-0.666), addition of the PRS showed similar Brier score and AUROC estimates (Brier score = 0.047, 95% CI: 0.046-0.048; AUROC: 0.667, 95% CI: 0.661-0.673) at five years. Cross-model Bayesian analysis demonstrated >99% probability of practical equivalence when trying to detect a difference of ≥ 0.02. Conclusion: All three PRS for hernia were independently associated with IH, suggesting that genomic factors contribute significantly to IH development. However, none of the three PRS meaningfully improved clinical IH risk prediction in patients who underwent abdominal surgery. This suggests that clinical comorbidities and surgical techniques may be as important as genomic architecture.

6
Immune Biomarker Signatures as Predictors of Functional and Pain Recovery After Total Knee Arthroplasty in Older Adults

Kraus, V. B.; Greenberg, N. D.; Ashner, M.; Huebner, J. L.; Bareja, A.; Peskoe, S.; Simon, C.; Whitson, H. E.; Colon-Emeric, C. S.

2026-06-10 geriatric medicine 10.64898/2026.06.08.26355189 medRxiv
Top 22%
0.7%

Postoperative resilience varies widely among older adults, yet the biological drivers of recovery remain unclear. We evaluated whether preoperative immune profiles, measured in plasma and through ex vivo whole blood stimulation, predict resilience to the acute stress of total knee arthroplasty. A total of 152 adults (aged 60 years or older) in the PRIME KNEE cohort underwent elective total knee arthroplasty and had available blood samples for measurement of 45 immune biomarkers, quantified in plasma and in whole blood stimulated ex vivo for 24 hours with lipopolysaccharide (LPS) or influenza antigen (FLU). Resilience was assessed using Expected Recovery Differential (ERD) and Resilience Trajectory (RT) across pain severity, pain interference, lower extremity physical activities of daily living (LE PADLs), and step counts. An exploratory stability selection framework using LASSO identified biomarker predictors of postoperative outcomes. Plasma and stimulated biomarkers showed broadly similar predictive performance. A shared set of biomarkers, including LBP, leptin, TNFR1, CD30, and LIF, was consistently selected across models. Immune predictors explained ~12-24% of the variance in resilience outcomes. Distinct immune signatures emerged for pain versus functional recovery: pain-related predictors mapped to local inflammatory and neuroimmune pathways, whereas function-related predictors reflected systemic inflammatory load and cytokine signaling. Preoperative immune biomarkers, whether measured in plasma or after ex vivo stimulation, capture meaningful variance in postoperative resilience. The divergence between pain-related and function-related immune signatures highlights biologically distinct pathways underlying different dimensions of recovery and supports further development of immune-based perioperative risk assessment.

7
Heart Rate Circadian Oscillations as Digital Biomarkers of Cardiometabolic Health Determinants

Colitta, A.; Bruno, S.; Benedetti, D.; Hoxhaj, D.; Cruz-Sanabria, F.; Di Pede, C.; Buracchi Torresi, F.; Frumento, P.; Gargani, L.; Fabbrini, M.; Maestri Tassoni, M.; Bonanni, E.; Faraguna, U.

2026-06-10 cardiovascular medicine 10.64898/2026.06.07.26355124 medRxiv
Top 22%
0.7%

AIMS Cardiometabolic risk factors may impair health by altering the autonomic modulation of the cardiovascular system, a physiological process described by heart rate (HR) circadian oscillations. However, the impact of cardiometabolic health determinants on HR circadian oscillations remains scarcely characterized in real-world, population-based settings. To address this, we applied digital health technologies to investigate how cardiometabolic health determinants shape HR circadian oscillations in a real-world cohort of individuals free of cardiometabolic diseases. METHODS First, a 10-fold cross-validation of a model was performed, aiming at mitigating wearables measurement error caused by motion artifacts. This process was informed by 10,056 epochs of concurrent wearable-derived and polysomnographic HR assessment, yielding an average 1.3 bpm reduction in wearables measurement error. We subsequently applied this model to over 2 million 1-minute epochs of HR data, derived from 7-day continuous actigraphic recordings of 245 individuals free of cardiometabolic disorders. Functional-on-scalar regression modelling and both parametric and nonparametric analyses characterized HR circadian profiles and their relationships with demographics, lifestyle, chronotype, sleep health, and chronic insomnia diagnosis. A 6-dimension sleep health index was calculated. RESULTS Sex, chronotype, and sleep health predominantly shaped HR circadian oscillations. In detail, females consistently showed higher HR across the 24 hours. Moreover, chronotype was associated with a phase shift in HR circadian profiles, with later timings corresponding to eveningness. Notably, sleep health impacted HR circadian oscillations in a dose-dependent fashion: each additional impaired sleep dimension was associated with a 1.2 bpm HR increase during nighttime, alongside reduced circadian robustness and delayed oscillation timings.
Finally, the earlier occurrence of morning HR peaks served as a digital biomarker of insomnia (80% specificity, 74% sensitivity). CONCLUSIONS This work provides a digital health framework to characterize HR circadian oscillations in free-living populations and supports its clinical utility in capturing the autonomic disruptions related to cardiometabolic health determinants.

8
Estimating COVID-19 Cumulative Incidence from Seroprevalence Surveys accounting for Time-Varying Seroreversion: A Fully Bayesian Methodology

Owusu-Boaitey, N.; Meyer, M. J.; Herrera-Esposito, D.; Bottcher, L.; Lukz, M.; Cook, S.; Stoto, M. A.; Kraemer, J. D.

2026-06-10 epidemiology 10.64898/2026.06.09.26355264 medRxiv
Top 25%
0.5%

Seroprevalence surveys reveal the extent of humoral immunity against pathogens such as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), and under some circumstances represent cumulative incidence of prior infection. However, antibody waning - or seroreversion - biases these estimates by reducing assay sensitivity in a time-varying manner. Because assay sensitivity decays over time, naively using serosurveys can substantially bias estimates of SARS-CoV-2 cumulative incidence and fatality rates. The Bayesian assay-specific, time-varying sensitivity adjustment developed in this paper can reliably correct for this bias and account for the delay between infection and serosurvey. In seroprevalence studies conducted in the United States in 2020, adjusting for time-varying sensitivity increased cumulative incidence by up to 1.4-fold, with an adjustment of 1.08 for a national study. Our estimates contrast with a previously published 2-fold adjustment that did not account for assay design. This suggests that previous analyses overestimated cumulative incidence by applying seroreversion corrections that did not account for assay-specific effects, or underestimated cumulative incidence by not applying seroreversion corrections. These biases imply fatality rate underestimation and overestimation, respectively. Our model provides a framework for design-specific time-varying sensitivity corrections in seroprevalence surveys for other pathogens.
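The bias described above can be illustrated with a deliberately simplified, non-Bayesian sketch: the classic Rogan-Gladen correction, applied once with the assay's nominal sensitivity and once with a sensitivity averaged over time since infection. The exponential decay form, every parameter value, and the function names are assumptions chosen for illustration; this is not the assay-specific model developed in the paper.

```python
def effective_sensitivity(days_since_infection, weights, s0, half_life_days):
    """Average a decaying sensitivity over the infection-time distribution.

    Assumes s(t) = s0 * 0.5 ** (t / half_life_days); the decay form and the
    parameters are illustrative, not taken from the paper.
    """
    total = sum(weights)
    return sum(
        w * s0 * 0.5 ** (t / half_life_days)
        for t, w in zip(days_since_infection, weights)
    ) / total


def rogan_gladen(observed_prevalence, sensitivity, specificity):
    """Rogan-Gladen correction of an observed prevalence for test accuracy."""
    adjusted = (observed_prevalence + specificity - 1) / (sensitivity + specificity - 1)
    return min(max(adjusted, 0.0), 1.0)  # clamp to a valid proportion


# Hypothetical survey: 10% seropositive; infections 30, 90 and 150 days before sampling.
sens = effective_sensitivity([30, 90, 150], [0.2, 0.5, 0.3], s0=0.95, half_life_days=365)
naive = rogan_gladen(0.10, 0.95, 0.99)
adjusted = rogan_gladen(0.10, sens, 0.99)
print(f"effective sensitivity={sens:.3f}, naive={naive:.3f}, adjusted={adjusted:.3f}")
```

Because the time-averaged sensitivity is lower than the nominal one, the adjusted cumulative incidence is higher than the naive estimate, which is the direction of correction the abstract reports.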

9
Stochastic Morphodynamics of the Human Aorta Across the Lifespan

Twohig, K. C.; Mansour, M.; Pugar, J. A.; Yuan, K.; Pocivavsek, L.; Klishin, A. A.

2026-06-08 surgery 10.64898/2026.06.05.26355015 medRxiv
Top 25%
0.5%

Biological systems evolve as continuous dynamical processes, but at organ-scale and across human lifespans they are rarely observed longitudinally; population data typically exist instead as sparse, cross-sectional snapshots. Inferring lifespan dynamics from such data requires methods distinct from those used at cellular and tissue scales where dense observations are accessible. We address this problem in the thoracic aorta, where surgical decisions currently rest on static, age- and sex-agnostic diameter thresholds that reduce three-dimensional morphology to a single scalar. Treating normal aortic morphology as a stochastic dynamical system, we pose a continuous-time drift-diffusion process in a two-coordinate state space of normalized surface area (A) and normalized fluctuation in integrated Gaussian curvature (δK), and fit closed-form solutions of the Fokker-Planck equation by maximum likelihood to a sex-balanced, age-uniform cohort spanning infancy to age 99. Inter-individual variability is treated as a fitted diffusion parameter rather than as residual scatter, which is distinct from prior normative studies that report variability as scatter around a regression line. The framework identifies two growth regimes for aortic size (childhood expansion followed by persistent adult growth, with adult males growing approximately 70% faster than adult females) and a single dynamical regime for aortic shape, with heteroscedastic variability accumulating at a rate comparable to the mean drift over the lifespan. Applied to independent cohorts of acute and chronic thoracic aortic dissections, the multivariate model identifies over 95% as statistical outliers via Mahalanobis distance, consistently outperforming either coordinate alone. The same probabilistic envelope that describes normal aging thus defines a baseline against which disease can be detected, supporting a shift toward dynamic, age- and sex-aware assessment of thoracic aortic pathology.
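Why a multivariate Mahalanobis distance can flag cases that neither coordinate flags alone can be shown with a minimal sketch. It uses a static two-dimensional Gaussian reference cloud built from synthetic data, which is an assumption for illustration; the paper's envelope comes from an age- and sex-dependent drift-diffusion model, which this sketch does not attempt to reproduce.

```python
import math
import random


def mahalanobis_2d(point, reference):
    """Mahalanobis distance of a 2-D point from a cloud of reference points."""
    n = len(reference)
    mx = sum(x for x, _ in reference) / n
    my = sum(y for _, y in reference) / n
    sxx = sum((x - mx) ** 2 for x, _ in reference) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in reference) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in reference) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # Quadratic form using the closed-form inverse of the 2x2 covariance matrix.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)


random.seed(0)
# Synthetic "normal" cohort: two coordinates (stand-ins for A and delta K)
# with unit variance and a correlation of about 0.8.
reference = []
for _ in range(500):
    x = random.gauss(0, 1)
    reference.append((x, 0.8 * x + 0.6 * random.gauss(0, 1)))

candidate = (1.5, -1.5)  # within 1.96 SD on each axis, but against the correlation
threshold = math.sqrt(5.991)  # 95% quantile of chi-square with 2 degrees of freedom
d = mahalanobis_2d(candidate, reference)
print(f"distance={d:.2f}, threshold={threshold:.2f}, outlier={d > threshold}")
```

The candidate would pass a univariate 1.96-SD screen on either coordinate, yet it lies far outside the joint envelope because it contradicts the correlation between the two coordinates.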

10
Serological thresholds of risk reduction for infant group B streptococcus disease

Cantrell, L.; Karampatsas, K.; Andrews, N.; Beach, S.; Bentley, E.; Berardi, A.; Bijlsma, M. W.; Cagil Kocana, C.; Daniel, O.; French, N.; Hall, T.; Izu, A.; Khalil, A.; Kwatra, G.; Kyohere, M.; Madhi, S. A.; Mboizi, R.; Miselli, F.; Nielsen, M.; Thorn, N.; van de Beek, D.; Walker, K.; Heath, P. T.; Le Doare, K.; Voysey, M.; PREPARE WP3 Study Group

2026-06-06 epidemiology 10.64898/2026.05.29.26353453 medRxiv
Top 25%
0.5%

Vaccines to prevent infant group B streptococcus (GBS) disease are advancing, with licensure likely based on safety and immunologic endpoints rather than clinical efficacy data. This approach requires robust, generalisable serological thresholds of risk reduction (SToRRs). We combined data from six case-control studies in Europe and Africa to define SToRRs for early-onset (EOD) and late-onset (LOD) GBS disease. Across diverse epidemiological and healthcare settings, anti-capsular polysaccharide IgG concentrations were consistently higher in infants who remained disease free than in those who developed disease. Higher antibody concentrations were required to reduce the risk of EOD than LOD, and higher concentrations were required for serotype Ia than for serotype III. This study provides a quantitative framework to support correlates-based evaluation and potential licensure of maternal GBS vaccines.

11
Immunologically Optimized Zmp1 Peptides Reveal a Translational Serological Biomarker Platform for Tuberculosis Diagnosis Across Disease Manifestations

Zade, O. S.; Yandrapally, S.; Choudhari, K.; Gaikwad, A. V.; Panda, R.; Neela, V. S. K.; Devalraju, K. P.; Eedara, R. V. V.; Ansari, M. S.; Chandrashekhar, C.; Sriram, D.; Mohareer, K.; Valluri, V. L.; Somvanshi, P. R.; Banerjee, S.

2026-06-12 infectious diseases 10.64898/2026.06.11.26355355 medRxiv
Top 27%
0.5%

Tuberculosis (TB) diagnosis remains challenging, particularly for extrapulmonary TB (EPTB), where invasive sampling, low bacillary burden, and suboptimal sensitivity of nucleic acid-based tests in peripheral specimens hinder timely detection. Here, we report an immunology-driven strategy for biomarker discovery and development of a peptide-based serological assay targeting Mycobacterium tuberculosis zinc metalloprotease-1 (Zmp1). Leveraging fundamental principles of adaptive immunity that antigenic regions containing overlapping B-cell and CD4 T-helper cell epitopes would preferentially generate high antibody titers through linked recognition and cognate T-cell help, we used an immunoinformatics pipeline to identify two nested immunodominant peptide regions within Zmp1 (Mtb-Zp-NT and Mtb-Zp-CT) enriched for overlapping B- and T-cell epitopes. The diagnostic potential of these peptides was evaluated through ELISA-based serological assays. A blinded pilot study (N=137) demonstrated a clear discrimination between active TB and TB-recovered individuals. The assay was subsequently validated in an expanded cohort (N=875) by screening 6,086 individuals, which identified 457 TB-positive cases. The cohort included pulmonary TB (PTB), EPTB, TB-recovered individuals, household contacts, non-specific infections, and healthy controls. Receiver operating characteristic analyses, supported by DeLong and bootstrap comparisons, revealed superior diagnostic performance of the peptide-based assays relative to full-length Zmp1. Mtb-Zp-CT exhibited the highest accuracy (AUC=0.93; specificity >90%), while Mtb-Zp-NT also demonstrated strong discriminatory power (AUC≈0.89). These findings establish that the immunologically optimized Zmp1 peptides are highly promising serological biomarkers for TB and EPTB.
More broadly, they demonstrate how mechanistically informed epitope selection can accelerate translation of pathogen-specific immune signatures into sensitive, minimally invasive, and potentially point-of-care diagnostic platforms for resource-limited settings.

12
Conversational Speech for Respiratory Triage in Primary Care: A Pilot Study

Ravi, V.; Noufi, C.

2026-06-11 respiratory medicine 10.64898/2026.06.09.26355284 medRxiv
Top 28%
0.4%

Background. Respiratory complaints account for a substantial share of adult ambulatory care visits, and triaging them accurately has direct consequences for antibiotic stewardship and pathogen-specific therapy. Prior work has investigated voice as a triage signal, but that literature is dominated by single-condition detection from scripted speech in crowdsourced or controlled clinical settings and has not been evaluated at primary care scale on conversational ambient audio. Methods. A dataset of 514,377 ambient-recorded primary care visits from 379,225 adult patients at a US clinic network was used, with per-visit clinically assigned ICD-10 diagnosis codes and de-identified demographic and geographic metadata. Patient audio was extracted from each doctor-patient conversation, and spectral, voice quality, and prosodic features were computed. Eleven binary classification tasks were defined, aligned with a respiratory triage cascade (e.g., acute respiratory versus acute non-respiratory illness, and lower versus upper respiratory tract infection). An acoustic model (feed-forward network) was trained independently for each task using patient-stratified five-fold cross-validation and evaluated on a held-out test set. Each task's model was also compared against six non-acoustic baselines using a single demographic, geographic, or temporal variable. The 11 trained classifiers were composed into a hierarchical cascade and illustrated as case studies on selected patients. Results. Test-set AUC across the 11 tasks ranged from 0.602 (95% CI: 0.588-0.614) to 0.745 (95% CI: 0.742-0.748), with a mean expected calibration error of 0.018. Six of eleven binaries outperformed all confounder baselines. Four binaries showed median within-stratum AUC of 0.62-0.70 when the confounder was held fixed, indicating acoustic discrimination beyond what the confounder alone explains. 
The exception was the pneumonia versus non-pneumonia lower respiratory tract infection binary, which failed against the patient-city confounder baseline, plausibly reflecting a clinic-level difference in ICD-10 coding. Conclusion. Conversational primary care audio carries acoustic signal that discriminates clinically meaningful respiratory contrasts. Absolute performance is moderate, but the conditions are stricter than prior work: conversational speech and differential-diagnosis contrasts among sick patients. This pilot study is a baseline for voice-based clinical AI moving beyond sick-versus-healthy detection toward differential-diagnosis panels and a proof-of-concept for hierarchical reasoning.
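For readers unfamiliar with the calibration metric quoted above, the following is a minimal sketch of expected calibration error (ECE) for a binary classifier. The equal-width binning and the choice of ten bins are assumptions; the abstract does not state which ECE variant the authors used, and the toy predictions are invented for illustration.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE: weighted mean gap between confidence and outcome rate."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(p for p, _ in members) / len(members)
        frac_pos = sum(y for _, y in members) / len(members)
        ece += len(members) / n * abs(mean_conf - frac_pos)
    return ece


# Toy predictions: the model says 10% where 25% are positive, and 90% where 75% are.
probs = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
labels = [0, 0, 0, 1, 1, 1, 1, 0]
ece = expected_calibration_error(probs, labels)
print(f"ECE = {ece:.3f}")
```

Both occupied bins are off by 0.15 in this toy case, so the ECE is 0.15; the reported mean of 0.018 would correspond to predicted probabilities that track observed frequencies far more closely.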

13
General-purpose large language models can achieve physician-level accuracy in complex medical data extraction

Rajeev, M.; Narayan, A.

2026-06-10 gastroenterology 10.64898/2026.06.06.26354838 medRxiv
Top 28%
0.4%

Background: Unstructured data represent about 80% of total electronic health records (EHR) data. Structuring this free text is essential for advancing clinical research, including cohort selection for trials, retrospective studies, and the development of disease registries. While manual chart review (MCR) remains the gold standard for extracting this clinical data, the process is inherently slow, resource-intensive, and susceptible to errors from human fatigue. We evaluated the extraction accuracy, safety, and efficiency of the HeLIX (Hepatology Logic-Integrated Extraction) framework, a Large Language Model (LLM) protocol using Google Gemini 3 Pro, compared to a gold-standard Manual Chart Review (MCR). Methods: A prospective validation study was conducted using 50 high-complexity, simulated hepatology discharge summaries designed to replicate the real-world heterogeneity of EHRs. The HeLIX framework employed a Zero-Shot, Structured Chain-of-Thought (CoT) prompting strategy enforced by a three-layer architecture: Clinical Reasoning Trace, Schema Enforcement, and Evidence Verification. The model extracted 45 distinct clinical variables. Performance was benchmarked against a consensus MCR. Results: Across 2,250 evaluated data points, the model achieved an overall Extraction Accuracy of 99.24% (95% CI: 98.8%-99.5%), with perfect concordance in 35/45 (77.8%) variables. For binary diagnostic variables, the model demonstrated an overall F1-score of 0.98, Recall of 0.99 and substantial inter-rater reliability (Cohen's κ = 0.97). Hallucinations were exceptionally rare (2/2250; 0.08%). Critical errors affecting clinical management occurred in only 2 instances (<0.1% of total data), both involving etiological misattribution in complex multifactorial diagnoses. The AI workflow was 13.4-fold faster and 95.1% more cost-effective than manual extraction.
Conclusion: The HeLIX framework demonstrates physician-level accuracy and reliability in extracting complex hepatology data. It offers a scalable, efficient, and economical alternative to manual chart review. Such frameworks could accelerate clinical research, enabling healthcare systems globally to build comprehensive patient registries for a fraction of the traditional cost.
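The headline accuracy and its confidence interval can be sanity-checked from the counts given in the abstract. The sketch below uses the Wilson score interval, which is an assumption: the abstract does not say which interval method the authors used. The count of 2,233 correct extractions is likewise inferred from the reported 99.24% of 2,250 data points rather than stated directly.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


n_points = 50 * 45  # summaries x variables = 2,250 data points
correct = round(0.9924 * n_points)  # 2,233, inferred from the reported accuracy
low, high = wilson_interval(correct, n_points)
print(f"{correct}/{n_points} = {correct / n_points:.2%}, 95% CI {low:.1%} to {high:.1%}")
```

Under these assumptions the interval rounds to 98.8% to 99.5%, consistent with the reported figures.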

14
Genetic basis of dynamic brain states reveals cellular and disease associations

Ebneabbasi, A.; Whiteside, D. J.; Gu, Y.; Bethlehem, R. A. I.; Warrier, V.; Rittman, T.

2026-06-12 genetic and genomic medicine 10.64898/2026.06.10.26355409 medRxiv
Top 29%
0.4%

Dynamic resting-state fMRI captures the time-varying patterns of brain activity that are obscured by static approaches. Hidden Markov Models (HMMs) characterise these dynamics as recurring whole-brain states and quantify their fractional occupancy (FO), the proportion of time spent in each state, yet the biological basis of inter-individual variation in FO remains unclear. Using data from 52,335 White UK Biobank participants, with replication in East and South Asian subsamples, this study examined the heritability, cellular and neurotransmitter basis of brain states, and their links with complex phenotypes. FO was significantly heritable and enriched for neuronal populations, particularly glutamatergic and GABAergic signalling. Analyses identified shared and state-specific loci and revealed genetic correlations, colocalisation, and potential causal relationships between FO and several phenotypes, including educational attainment, sleep duration, and disease risk. These findings establish dynamic brain states as biologically grounded intermediate phenotypes, linking genetic variation to neural dynamics, diseases and traits.
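Fractional occupancy as defined above is a simple summary of a decoded state sequence. The following minimal sketch assumes a hard state assignment per time point (for example, a Viterbi path from a fitted HMM); the state sequence is invented, and the study may instead have averaged state probabilities.

```python
from collections import Counter


def fractional_occupancy(state_sequence, n_states):
    """Proportion of time points spent in each of n_states brain states."""
    counts = Counter(state_sequence)
    total = len(state_sequence)
    return [counts.get(k, 0) / total for k in range(n_states)]


# Hypothetical decoded sequence for one participant: 10 time points, 3 states.
states = [0, 0, 1, 2, 2, 2, 1, 0, 0, 2]
fo = fractional_occupancy(states, n_states=3)
print(fo)
```

Each participant yields one FO value per state, and it is these per-state values that serve as the phenotypes in the genetic analyses described above.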

15
Decoding the Genetic Architecture of Autistic Traits in the Aging Population

Tian, P.; Rao, X.; Sui, Y.; Gao, S.; Meng, Y.; Han, X.; Wang, T.

2026-06-11 genetic and genomic medicine 10.64898/2026.06.10.26355340 medRxiv
Top 30%
0.4%

Autism research has mostly focused on diagnostic frameworks in childhood. However, autistic traits including social skills, communication, attention switching, attention to detail, and imagination may also vary in many undiagnosed individuals beyond childhood, and the genetic architecture of autistic traits in undiagnosed aging adults remains poorly understood. Here, we performed an exome-wide association study of autistic traits in adults aged >=40 from the UK Biobank (n = 161,269) and independently validated key findings in the SPARK cohort (n = 142,357). We identified exome-wide significance at 17q21.31, represented by a lead variant associated with social skills (rs199533, beta = 0.081, P = 2.04e-11). In addition, we identified an independent signal for communication (rs12632110, beta = 0.042, P = 3.07e-12) and two independent signals for attention switching (rs690733, beta = 0.046, P = 4.26e-12; rs2164272, beta = -0.047, P = 1.73e-12). Gene-based analyses further implicated loss-of-function variation in ZSCAN2 (beta = 1.00, P = 2.44e-6), which was associated with communication differences. Enrichment analyses revealed preferential expression of implicated genes in the cerebral cortex, while phenotypic and neuroimaging analyses linked those variants to cortical brain structure and regional volume. Taken together, these findings delineate the genetic architecture of autistic traits in the aging population and link genetic variation to downstream molecular and neuroanatomical mechanisms.

16
Estimating the effectiveness of syndromic screening at airports for Bundibugyo ebolavirus disease

Quilty, B. J.

2026-06-12 epidemiology 10.64898/2026.06.11.26355442 medRxiv
Top 30%
0.4%

We used a stochastic simulation model to estimate the effectiveness of combined exit and entry airport screening for Bundibugyo ebolavirus disease (BVD), using natural-history parameters from a Bayesian re-analysis of the 2012 Isiro outbreak. For a 12-hour international flight from DRC or Uganda at 86% screening sensitivity, we estimate 65% of infected travellers would arrive undetected (95% CrI: 38-76%). The main driver of this outcome is the relative duration of the incubation period (approximately 7.7 days) and the onset-to-severe-disease interval (approximately 4 days): most infected travellers board before symptom onset and are undetectable by any syndromic screen, whilst those who are symptomatic progress rapidly to illness severe enough to preclude travel. This is compounded during active epidemic growth, when recently exposed (and therefore pre-symptomatic) cases are overrepresented among travellers. Syndromic airport screening offers limited protection against BVD spread via air travel, and should be complemented by outbreak control at source and strengthened clinical surveillance in receiving countries with high travel connectivity to affected areas.

17
Sensor Geometry, Not Signal Processing, Limits Opportunistic Detection of Capillary-Refill-Like Signals by Rule-Based and Language-Model Methods in Archived ICU Waveforms

Landry, T. C.; Kim, Y.

2026-06-09 intensive care and critical care medicine 10.64898/2026.06.07.26355129 medRxiv
Top 31%
0.3%

Background. Capillary refill time is a resuscitation target in septic shock, but bedside measurement is examiner-dependent. An ICU monitor co-records a photoplethysmogram on the pulse oximeter and intermittent noninvasive blood pressure cuff cycles; if the probe and the cuff share a limb, each cycle is an unplanned vascular occlusion test on the distal microvascular bed. Standard practice places the two on opposite limbs. Objective. To measure how often, in MIMIC-IV-WDB v0.1.0, charted cuff cycles show the photoplethysmographic morphology expected of a same-limb cuff and probe, and to characterize the candidate capillary refill-like signal when that morphology is present. Methods. MIMIC-IV-WDB v0.1.0 was linked to the MIMIC-IV clinical database. A pre-registered rule-based detector identified candidate occlusion-reperfusion signatures on the 1-Hz perfusion-index envelope around each charted cuff timestamp. The primary endpoint was the proportion of cuff cycles suitable for analysis that were detector-positive at a 15-second reperfusion threshold, with 95% confidence intervals estimated by resampling patients at a fixed seed. A secondary analysis used a locally hosted multimodal language model (a Gemma-3 derivative on a non-device server) to adjudicate the same signature on perfusion-index plots; no MIMIC-IV-WDB content left the workstation. Results. Of 9,224 charted cuff cycles, 8,909 had a usable pulse-oximeter waveform, and 268 cycles in 15 patients (4.30% of the 6,236 cuff cycles suitable for analysis, 95% CI 2.60 to 6.03) met the primary 15-second threshold. The language model adjudicated the same cycles and called 1,367 of the 8,909 cycles with a usable waveform (15.34%) signature-present, roughly five times the detector's count. Because no laterality ground truth exists, agreement with a single blinded reader served as the comparator rather than accuracy.
The two methods were about equally concordant with the reader: precision was 0.25 (95% CI 0.14 to 0.39) for the detector and 0.24 (95% CI 0.10 to 0.35) for the language model, although reweighting to the full population of cycles with a usable waveform lowered the language model to 0.030 (95% CI 0.009 to 0.053). These estimates are reference-limited: a blinded re-read of a 150-card subsample showed only moderate intra-rater reliability (Cohen's κ 0.46 to 0.59) with systematic undercalling on the first pass, and rescoring against the corrected re-read roughly doubled precision for both methods. Conclusions. Opportunistic extraction of capillary refill-like signals from archived ICU pulse oximetry is limited in two distinct ways. First, sensor geometry limits how often the signal is recordable: cuff cycles rarely show the morphology expected of a same-limb cuff and probe pair, consistent with opposite-limb placement, so the bottleneck is geometry rather than signal processing. Second, the modest reliability of morphology adjudication limits how well any single flagged cycle can be confirmed: against a blinded reader the detector is a usable screen but a noisy confirmer, the reference is itself only moderately reliable, and the language model is no more concordant despite flagging many more cycles. The minority of cycles in which the morphology appears contain a candidate signal that may merit prospective study under controlled placement with laterality recorded.

18
Influence of comorbid diabetes mellitus on outcomes in multiple sclerosis: an English population-based matched cohort study

Lau, Y.; Zabihi, S.; Hartmann, M.; Mathlin, G.; Banerjee, S.; Marouf, E.; Hadley, C.; Cooper, C.; Dobson, R.

2026-06-10 neurology 10.64898/2026.06.05.26354993 medRxiv
Top 32%
0.3%

Importance: As new treatments increase quality and length of life in people with multiple sclerosis (MS), effective prevention and management of common comorbidities, including Diabetes Mellitus (DM), is increasingly important. Objective: To compare incidence of DM and its associations with hospitalisation and mortality in adults with MS and matched controls. Design: Using English primary care data from the Clinical Practice Research Datalink (CPRD), linked to Hospital Episode Statistics and national mortality records, we matched adults with MS diagnosed between 2000 and 2023, with up to ten controls without MS by age, sex, and practice. We excluded individuals with preexisting DM, defined using diagnostic and management codes. Outcomes included all-cause hospitalisation (number and duration) and mortality. We used Poisson, negative binomial, linear, and Cox proportional hazards models, adjusting for demographic and socioeconomic factors, adding interaction terms to examine if ethnicity, deprivation, and urbanity were associated with outcomes. Results: We included 9,010 individuals with MS and 78,121 matched controls. Over a mean follow-up of 13.2 years, people with MS had over twice the incidence of DM compared with controls (adjusted incidence rate ratio [aIRR] = 2.26, 95% CI: 1.96 to 2.61, p<0.001). Among people with MS, incident DM was associated with higher hospitalisation rates (aIRR = 1.82, 95% CI: 1.47 to 2.28, p<0.001), longer hospitalisation duration (median 18 vs 4 days, adjusted β = 0.53, 95% CI: 0.41 to 0.65, p<0.001), and increased all-cause mortality when incident DM was modelled as a time-varying exposure (adjusted hazard ratio = 1.46, 95% CI: 1.17 to 1.82, p<0.001), compared to those who did not develop DM.
Similar patterns were observed among controls (hospitalisation rates: aIRR = 2.96, 95% CI: 2.63 to 3.23, p<0.001; hospitalisation duration: adjusted β = 0.93, 95% CI: 0.86 to 0.99, p<0.001; mortality [time-varying]: HR = 1.50, 95% CI: 1.27 to 1.77, p<0.001). The relationship between DM and increased hospitalisation was stronger in rural areas among those with MS and stronger in White groups among controls. Conclusions: People with MS are more likely to be diagnosed with DM, resulting in greater all-cause hospitalisation and all-cause mortality. This highlights the importance of equitable screening, prevention, and management of DM in people living with MS, with particular attention to geographical health inequalities.

19
Impact of Early Treatment on Symptom Improvement and Procedural Events among Men with BPH and Bothersome Lower Urinary Tract Symptoms: A Contemporary Analysis of the American Urological Association Quality (AQUA) Registry

Ernandez, J.; Najafi, A.; Roehrborn, C. G.; Lerner, L. B.

2026-06-10 urology 10.64898/2026.06.08.26355194 medRxiv
Top 32%
0.3%

PURPOSE: As the armamentarium of BPH therapies continues to expand, it remains imperative to maximize patient satisfaction and minimize decisional regret. We sought to determine the impact of time from BPH diagnosis to index treatment on symptom improvement and subsequent procedural events. MATERIALS AND METHODS: We queried the American Urological Association Quality Registry for men ≥ 40 years old with BPH, available IPSS data, and no receipt of prior BPH treatment. Index treatment included medication, surgery, or minimally invasive surgical therapy (MIST). Outcomes included IPSS over 3 years of follow-up, change in percentage of mild lower urinary tract symptoms (LUTS) by 3 months, and time to procedural event. Patients were stratified by time from index diagnosis to treatment: <12 months, 1-3 years, and >3 years. Outcomes were compared across time-to-treatment cohorts with appropriate statistical tests, with p < 0.05 considered significant. RESULTS: 43,919 patients met criteria, with 19,642 pursuing treatment. Patients pursued treatment at lower baseline IPSS than in prior prospective series. Patients undergoing surgery and MIST had significantly higher baseline IPSS, while medical comorbidities were significantly more common among men initiating pharmacotherapy. Early surgery and MIST were associated with significant improvement in IPSS within 6-12 months and an increase in mild LUTS by 3 months. All forms of early treatment were associated with delayed time to procedural events, including catheterization and fulguration. CONCLUSIONS: Early procedural intervention for BPH is associated with early symptom improvement and delayed time to procedural events in real-world, contemporary practice.

20
PhysiCase: Development and dual-layer validation of synthetic cases for health professional education: A pilot study leveraging Generative AI

Komolafe, O. O.; Roberts, A. C.; Shelley, J.; Tawiah, A. K.

2026-06-09 rehabilitation medicine and physical therapy 10.64898/2026.06.07.26355114 medRxiv
Top 32%
0.3%

High-quality, domain-specific datasets are foundational to advancing educational tools and AI systems in healthcare, yet assembling case repositories from real-world clinical records faces substantial privacy, ethical, and licensing barriers. Synthetic data generation offers a compelling pathway forward, but educational cases require rigorous validation to ensure clinical plausibility and pedagogical utility. This pilot study introduces PhysiCase, a dual-layer validation pipeline for synthetic case generation, and evaluates the feasibility of combining automated LLM-based screening with expert educator review. We generated 128 synthetic musculoskeletal (MSK) cases using four frontier large language models (GPT-4.1, GPT-4o, Google Gemini 2.5 Pro, and Llama 4 Scout) across 28 clinical conditions. Cases underwent automated quality screening using an "LLM-as-judge" framework (DeepEval) assessing prompt alignment, JSON correctness, answer relevance, bias, toxicity, and completeness. Ninety cases (70.3%) passed automated filtering and proceeded to expert evaluation by four MSK physiotherapy educators, who rated medical accuracy, realism, fidelity, relevance, and usability on 5-point Likert scales. GPT-4.1 demonstrated the highest automated pass rate (96%) and strongest expert ratings (medical accuracy 4.10/5, usability 4.38/5), while Llama 4 Scout showed the lowest pass rate (33.3%) and expert ratings. Expert-evaluated cases achieved strong content validity indices for usability (97.5%), relevance (97.5%), and realism (95%), though medical accuracy showed greater variance (CVI 87.5%). Cross-layer correlation analysis revealed that automated completeness metrics moderately aligned with expert usability ratings, while answer relevance and prompt alignment showed weak or negative correlations with clinical correctness. Qualitative analysis identified three primary failure modes: reductive logic, biomechanical inconsistency, and administrative/contextual gaps.
The dual-layer validation framework proved methodologically viable: automated screening efficiently reduced expert review burden, while human judgment remained indispensable for detecting subtle clinical reasoning failures. LLM-generated synthetic cases have the potential to meet practical educational needs for MSK physiotherapy, but expert validation is essential to safeguard clinical accuracy. These findings support a scalable division of labour for synthetic case development, with targeted improvements to prompting and automated reasoning checks needed to address identified "nuance gaps." The code for this paper is available at https://github.com/kwid-ai/PhysiCase